Supplementary Material for DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation

Neural Information Processing Systems

In this material, we give additional technical details and experiments to support the main paper. An overview of the proposed framework, DeWave, is illustrated in Figure 6. The dataset is split into training (80%), development (10%), and testing (10%) sets, comprising 10,874, 1,387, and 1,387 unique sentences, respectively, with no overlap. We release our implementation code through GitHub to contribute to this area. As described in Section 3.3, a 6-layer CNN encoder slides over the whole wave to obtain the embedding; the codex encoder shares the same structure as the one used for word-level features.
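The sliding CNN encoding step described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the kernel size, stride, and random weights are hypothetical stand-ins for the learned 6-layer encoder.

```python
import numpy as np

def conv1d(x, w, stride=2):
    """Valid 1-D convolution with stride; x: (T,), w: (k,)."""
    k = len(w)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], w) for i in range(out_len)])

def cnn_encoder(wave, n_layers=6, kernel=5, stride=2, seed=0):
    """Slide a stack of 1-D conv + ReLU layers over the raw wave to get an embedding."""
    rng = np.random.default_rng(seed)
    h = wave
    for _ in range(n_layers):
        w = rng.standard_normal(kernel) / np.sqrt(kernel)
        h = np.maximum(conv1d(h, w, stride), 0.0)  # conv followed by ReLU
    return h

emb = cnn_encoder(np.sin(np.linspace(0, 20, 1024)))  # toy single-channel "wave"
```

Each layer halves the temporal resolution (stride 2), so the 1,024-sample toy wave collapses to a short embedding vector after six layers.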




Russian drone crashes in Polish field as Warsaw protests airspace violation and plans formal complaint

FOX News

Lt. Gen. Keith Kellogg discusses the latest on the Russia-Ukraine war after a deadly Russian attack on 'America Reports.' A Russian drone may have crashed in a field in Poland, a move the country's deputy prime minister called a "provocation," as the United States and European leaders continue to push Moscow to end its war in Ukraine. The drone hit a cornfield in the village of Osiny in the eastern Lublin province, about 62 miles from Poland's border with Ukraine, Reuters reported. Deputy Prime Minister Wladyslaw Kosiniak-Kamysz, who also serves as defense minister, said Wednesday's incident was similar to cases in which Russian drones flew into Lithuania and Romania, and could be linked to efforts to end the war in Ukraine, according to the outlet. Polish police secure the area of a cornfield where an unidentified flying object crashed and exploded in the country's east in Osiny on Wednesday.


TNG-CLIP:Training-Time Negation Data Generation for Negation Awareness of CLIP

Cai, Yuliang, Thomason, Jesse, Rostami, Mohammad

arXiv.org Artificial Intelligence

Vision-language models (VLMs), such as CLIP, have demonstrated strong performance across a range of downstream tasks. However, CLIP is still limited in negation understanding: the ability to recognize the absence or exclusion of a concept. Existing methods address the problem by using a large language model (LLM) to generate large-scale image-caption data containing negation for further fine-tuning of CLIP. However, these methods are both time- and compute-intensive, and their evaluations are typically restricted to image-text matching tasks. To expand the horizon, we (1) introduce a training-time negation data generation pipeline, in which negation captions are generated during the training stage, adding only 2.5% extra training time, and (2) propose the first benchmark, Neg-TtoI, for evaluating text-to-image generation models on prompts containing negation, assessing a model's ability to produce semantically accurate images. We show that our proposed method, TNG-CLIP, achieves SOTA performance on diverse negation benchmarks of image-to-text matching, text-to-image retrieval, and image generation.
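The idea of generating negation captions on the fly during training can be sketched with a rule-based stand-in. This is an illustrative toy, not the paper's generator: the concept vocabulary and the "with no X" template are hypothetical.

```python
import random

# Hypothetical concept vocabulary; a real pipeline would derive absent
# concepts from the dataset rather than a fixed list.
CONCEPTS = ["dog", "cat", "car", "tree", "person"]

def negate_caption(caption, concepts=CONCEPTS, rng=random.Random(0)):
    """Append a negated clause naming a concept absent from the caption."""
    absent = [c for c in concepts if c not in caption.lower()]
    if not absent:
        return caption
    return f"{caption}, with no {rng.choice(absent)}"

batch = ["a dog running on grass", "a red car on the street"]
neg_batch = [negate_caption(c) for c in batch]  # generated per training batch
```

Because generation happens inside the training loop rather than as a precomputed dataset, the overhead is only the per-batch string manipulation, which is the intuition behind the reported small increase in training time.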


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at \url{https://wave-pulse.io}.
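The downstream analysis over time-stamped, diarized transcripts can be sketched as follows. This is a minimal illustration under assumed data shapes, not the WavePulse codebase: the `Segment` fields and `topic_mentions` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the stream
    end: float
    speaker: str   # diarization label, e.g. "SPK1"
    text: str      # transcribed speech

def topic_mentions(segments, topics):
    """Count mentions of tracked topics across a station's transcript segments."""
    counts = {t: 0 for t in topics}
    for seg in segments:
        for t in topics:
            if t in seg.text.lower():
                counts[t] += 1
    return counts

segments = [Segment(0.0, 4.2, "SPK1", "the election results in Georgia"),
            Segment(4.2, 9.0, "SPK2", "local weather update")]
counts = topic_mentions(segments, ["election", "economy"])
```

Aggregating such per-segment counts by station and timestamp is what would let national trends be compared against local ones.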


From MLP to NeoMLP: Leveraging Self-Attention for Neural Fields

Kofinas, Miltiadis, Papa, Samuele, Gavves, Efstratios

arXiv.org Machine Learning

Neural fields (NeFs) have recently emerged as a state-of-the-art method for encoding spatio-temporal signals of various modalities. Despite the success of NeFs in reconstructing individual signals, their use as representations in downstream tasks, such as classification or segmentation, is hindered by the complexity of the parameter space and its underlying symmetries, in addition to the lack of powerful and scalable conditioning mechanisms. In this work, we draw inspiration from the principles of connectionism to design a new architecture based on MLPs, which we term NeoMLP. We start from an MLP, viewed as a graph, and transform it from a multi-partite graph to a complete graph of input, hidden, and output nodes, equipped with high-dimensional features. We perform message passing on this graph and employ weight-sharing via self-attention among all the nodes. NeoMLP has a built-in mechanism for conditioning through the hidden and output nodes, which function as a set of latent codes, and as such, NeoMLP can be used straightforwardly as a conditional neural field. We demonstrate the effectiveness of our method by fitting high-resolution signals, including multi-modal audio-visual data. Furthermore, we fit datasets of neural representations, by learning instance-specific sets of latent codes using a single backbone architecture, and then use them for downstream tasks, outperforming recent state-of-the-art methods. The source code is open-sourced at https://github.com/mkofinas/neomlp. The omnipresence of neural networks in the last decade has recently given rise to neural fields (NeFs). Consequently, the popularity of neural fields has spurred interest in neural representations, i.e., using NeFs as representations for a wide range of downstream tasks. Existing neural representations, however, suffer from notable drawbacks.
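The core operation described above, message passing via self-attention over a complete graph of input, hidden, and output nodes, can be sketched in numpy. This is a single-head, single-round toy under assumed dimensions, not the NeoMLP implementation.

```python
import numpy as np

def self_attention(H, Wq, Wk, Wv):
    """One round of message passing among all nodes via self-attention.
    H: (n_nodes, d) node features; the graph is complete, so every node
    attends to every other node (and itself)."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)  # row-stochastic attention weights
    return H + A @ V                   # residual update of node features

rng = np.random.default_rng(0)
d = 8
# Input, hidden, and output nodes share one high-dimensional feature space.
H = rng.standard_normal((3 + 4 + 2, d))  # 3 input, 4 hidden, 2 output nodes
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
H_new = self_attention(H, Wq, Wk, Wv)
```

Conditioning then amounts to swapping in instance-specific features for the hidden and output nodes, which act as latent codes, while the attention weights are shared across instances.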


Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder

Wang, Jiaqi, Song, Zhenxi, Ma, Zhengyu, Qiu, Xipeng, Zhang, Min, Zhang, Zhiguo

arXiv.org Artificial Intelligence

Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) the absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address the above issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. Furthermore, we develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations), which leverages pre-trained modules alongside the EEG stream from CET-MAE and further enables an LLM (specifically BART) to decode text from EEG sequences. Comprehensive experiments conducted on the popular text-evoked EEG database, ZuCo, demonstrate the superiority of E2T-PTR, which outperforms the state-of-the-art in ROUGE-1 F1 and BLEU-4 scores by 8.34% and 32.21%, respectively. These results indicate significant advancements in the field and underscore the proposed framework's potential to enable more powerful and widespread BCI applications.
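The hybrid objective described above, cross-modality contrastive learning combined with intra-modality masked reconstruction, can be sketched as a sum of two losses. This is a generic numpy illustration of the two loss families, not CET-MAE's actual objective; the temperature and masking ratio are hypothetical.

```python
import numpy as np

def contrastive_loss(E, T, tau=0.1):
    """Symmetric-style InfoNCE over paired EEG (E) and text (T) embeddings:
    each EEG row should match its own text row against in-batch negatives."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    logits = E @ T.T / tau
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_p))

def masked_recon_loss(x, x_hat, mask):
    """MSE on masked positions only (intra-modality self-reconstruction)."""
    return np.mean((x[mask] - x_hat[mask]) ** 2)

rng = np.random.default_rng(0)
E, T = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
x = rng.standard_normal((4, 10))
mask = rng.random((4, 10)) < 0.5          # ~50% of positions masked
total = contrastive_loss(E, T) + masked_recon_loss(x, 0.9 * x, mask)
```

In a multi-stream encoder, each modality would contribute its own reconstruction term while the contrastive term ties the two streams together.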


DeformNet: Latent Space Modeling and Dynamics Prediction for Deformable Object Manipulation

Li, Chenchang, Ai, Zihao, Wu, Tong, Li, Xiaosa, Ding, Wenbo, Xu, Huazhe

arXiv.org Artificial Intelligence

Manipulating deformable objects is a ubiquitous task in household environments, demanding adequate representation and accurate dynamics prediction due to the objects' infinite degrees of freedom. This work proposes DeformNet, which utilizes latent space modeling with a learned 3D representation model to tackle these challenges effectively. The proposed representation model combines a PointNet encoder and a conditional neural radiance field (NeRF), facilitating a thorough acquisition of object deformations and variations in lighting conditions. To model the complex dynamics, we employ a recurrent state-space model (RSSM) that accurately predicts the transformation of the latent representation over time. Extensive simulation experiments with diverse objectives demonstrate the generalization capabilities of DeformNet for various deformable object manipulation tasks, even in the presence of previously unseen goals. Finally, we deploy DeformNet on an actual UR5 robotic arm to demonstrate its capability in real-world scenarios.
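The recurrent state-space model (RSSM) used for dynamics prediction can be sketched as a deterministic recurrent state paired with a stochastic latent. This numpy toy uses linear maps as stand-ins for the learned networks; the dimensions and parameterization are hypothetical, not DeformNet's.

```python
import numpy as np

def rssm_step(h, z, a, params, rng):
    """One RSSM transition: update the deterministic state from the previous
    state, latent, and action, then sample the next stochastic latent from a
    Gaussian prior conditioned on the new state."""
    Wh, Wz, Wa, Wm, Ws = params
    h_next = np.tanh(Wh @ h + Wz @ z + Wa @ a)      # deterministic recurrent state
    mean, log_std = Wm @ h_next, Ws @ h_next        # prior over next latent
    z_next = mean + np.exp(log_std) * rng.standard_normal(len(mean))
    return h_next, z_next

dh, dz, da = 8, 4, 2
rng = np.random.default_rng(1)
params = (rng.standard_normal((dh, dh)) * 0.1,
          rng.standard_normal((dh, dz)) * 0.1,
          rng.standard_normal((dh, da)) * 0.1,
          rng.standard_normal((dz, dh)) * 0.1,
          rng.standard_normal((dz, dh)) * 0.1)
h, z = np.zeros(dh), np.zeros(dz)
for a in [np.ones(da)] * 5:                         # roll the dynamics forward
    h, z = rssm_step(h, z, a, params, rng)
```

Rolling this transition forward from an encoded observation is what lets a planner predict how the latent representation of the deformable object evolves under candidate actions.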


Gaussian Splashing: Dynamic Fluid Synthesis with Gaussian Splatting

Feng, Yutao, Feng, Xiang, Shang, Yintong, Jiang, Ying, Yu, Chang, Zong, Zeshun, Shao, Tianjia, Wu, Hongzhi, Zhou, Kun, Jiang, Chenfanfu, Yang, Yin

arXiv.org Artificial Intelligence

We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian splatting and position-based dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to Gaussian shader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. For more information, please visit our project page at \url{https://amysteriouscat.github.io/GaussianSplashing/}.
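The step of aligning each Gaussian kernel's orientation with the surface normal can be sketched geometrically: rotate the kernel's covariance so that its shortest principal axis lies along the normal. This numpy toy uses Rodrigues' rotation formula and is only an illustration of the alignment idea, not the paper's scheme.

```python
import numpy as np

def align_to_normal(cov, normal):
    """Rotate a Gaussian kernel's covariance so its shortest principal axis
    lies along the given surface normal (illustrative alignment only)."""
    normal = normal / np.linalg.norm(normal)
    vals, vecs = np.linalg.eigh(cov)                # eigenvalues in ascending order
    shortest = vecs[:, 0]                           # axis of smallest variance
    v = np.cross(shortest, normal)
    c = np.dot(shortest, normal)
    if np.isclose(c, -1.0):                         # 180-degree case
        R = -np.eye(3)
    else:
        K = np.array([[0, -v[2], v[1]],
                      [v[2], 0, -v[0]],
                      [-v[1], v[0], 0]])
        R = np.eye(3) + K + K @ K / (1.0 + c)       # Rodrigues' formula
    return R @ cov @ R.T

cov = np.diag([0.01, 1.0, 1.0])                     # flat "splat" kernel, thin along x
n = np.array([0.0, 0.0, 1.0])                       # surface normal along z
cov2 = align_to_normal(cov, n)
```

After alignment, the kernel's thin axis points along the normal, so the splat lies flat on the surface, which is the property that tames the spiky rotational-deformation artifacts mentioned above.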